Abstract

In this paper, we argue that existing concepts for the
design and implementation of network stacks for constrained
devices do not comply with the requirements of
current and upcoming Internet of Things (IoT) use cases.
The IoT requires not only a lightweight but also a modular
network stack, based on standards. We discuss functional
and non-functional requirements for the software
architecture of the network stack on constrained IoT
devices. Then, revisiting concepts from the early Internet
as well as current implementations, we propose a
future-proof alternative to existing IoT network stack
architectures, and provide an initial evaluation of this
proposal based on its implementation running on top of a
state-of-the-art IoT operating system and hardware.

1 Introduction 

The Internet of Things (IoT) promises a future where all
machines have started talking to one another, including
billions of cheap, tiny, programmable, communicating
devices (aka Things) such as wired or wireless sensors
and actuators. Based on various types of low-cost
microcontrollers and communication chips, those devices will
significantly increase heterogeneity within the Internet.

The past has shown that the success of the Internet
depends on the availability of network stack(s) that allow
for flexible composition of standards and enable a
wide variety of optional features to fit heterogeneous use
cases.

The design and implementation of network stacks have
challenged system engineers since the very beginning
of computer networking. At that time, computers exhibited
severe hardware resource constraints, similar to
IoT devices nowadays in terms of memory (kBytes instead
of GBytes) and in terms of CPU (Mflops instead
of Gflops). However, in contrast to the 80s and early
90s, the heterogeneity and thus the set of options and
scenarios is much larger in the IoT, which increases
complexity [20, 7, 22, 10]. Furthermore, Moore’s Law does
not apply to microcontrollers, and thus, such tiny devices
will remain prevalent in the future [?]. Since
full-featured systems such as Linux cannot be accommodated
on such tiny devices, novel solutions are needed.

In this paper, we argue for revisiting the design and
implementation space of network stacks for constrained
devices. Recent operating systems (e.g., RIOT [3])
support Linux-like functions but comply with the
constraints of IoT hardware, which makes it possible
to build flexible network stacks with a low memory
footprint.

We target class 1 devices or bigger, i.e., devices
with at least 10 kByte of RAM and a few tens of kByte
of ROM. We believe that for even more constrained devices,
there is no way around specialized, simplified, and
highly optimized implementations. Therefore, note that
our goal is not to engineer the most memory-efficient
network stack but to design a clean, structured, and
universal network stack that can be reused for many different
IoT use cases, while still being able to cope with
constrained environments. In detail, our contributions are as
follows:

1. we identify functional and non-functional requirements
for the software architecture of the network
stack on IoT devices (see Section 2),

2. we analyze existing IoT network stack architectures
from a systems point of view (see Section 3),

3. we propose an alternative architecture, which we argue
is more future-proof than existing architectures
because it is easier to extend continuously and to
configure for different IoT use cases, leveraging
cleaner interfaces and newly available IoT operating
system services (see Section 4),

4. we provide an initial evaluation of the proposed
architecture and show that it complies with typical resource
constraints of IoT hardware (see Section 5).

2 Assumptions & Design Objectives

As the complexity of software for embedded devices has
increased over the last decade, it has become state of
the art to use operating systems even on memory- and
CPU-constrained machines, such as IoT devices. A
full-featured network stack is one of the most complex pieces
of software to run on an embedded platform. By
full-featured, we refer to a stack allowing for a complete
implementation of the specifications per design. This point
is especially important, as one can easily simplify parts
of an implementation at the price of limiting the extent
of completeness that this implementation can achieve in
the end. In the following, we will make the assumptions
listed below.

Multi-Process & Hardware Independence. We assume
that the network stack is built atop an
OS that provides the following features: (i) support
of threads/processes, (ii) a lightweight process model,
(iii) efficient inter-process communication (IPC), (iv)
lightweight hardware abstraction, (v) a clean driver
model, and (vi) a memory footprint suitable for IoT
devices. Assumptions (i)-(iii) allow for a modular network
stack that is split over multiple processes without
significant overhead through administrative data structures
and run-time drawbacks. Assumptions (iv) and (v) enable
the network stack to be independent from specific
IoT hardware platforms and network devices. We also
assume that the OS allows the network stack to be open
source, maintained by a lively community (similar to
Linux).

It is worth noting that our assumptions are reasonable;
the operating system RIOT [3] matches all of them and
thus allows us to build a proof of concept of our proposed
architecture.


2.1 High-Level Objective & Approach 

The usual approach to deal with the heterogeneity of
embedded systems in the IoT (i.e., hardware constraints, use
cases) is to implement multiple network stacks, each
designed for a specific setup. While this yields optimized
results for a small group of scenarios, there are
drawbacks: multiple implementations vastly increase efforts
for implementation, testing, and maintenance, and incur
extra efforts to ensure interoperability.

We thus pursue a different approach: we aim for a single,
full-featured network stack that is flexible enough to
work in a broad range of IoT scenarios, while still being
efficient and small enough to run on constrained and
battery-driven devices. In the following, we break down
the various aspects of this high-level objective.


2.2 Functional Requirements 

Focus on IPv6. The network stack should enable
end-to-end connectivity between IoT devices and any
other Internet device. IPv6 is a good candidate for this
functionality, together with the 6LoWPAN suite of IP
protocols for low-power lossy networks (including RPL,
UDP, CoAP, etc.). Note that our design should also
easily apply to other layered network stacks. We will
therefore focus on IPv6, though not exclusively.

Full-featured. We aim for a full-featured network
stack in the sense that supported protocols should
implement their specifications completely as a long-term goal.
The point is to prohibit design decisions that will limit
future extensions of an implementation. The rationale
behind this is to allow for a generic solution, which can
be tailored to fit various use cases, instead of a solution
that is too specific by design.

Support for multiple network interfaces. IoT scenarios
do not only include basic sensors with a
microcontroller and a single low-power radio, but also border
routers with multiple interfaces (e.g., Ethernet and
IEEE 802.15.4) as well as upcoming IoT devices, which
are likely to have multiple radio interfaces (e.g., IEEE
802.15.4 and Bluetooth). Thus, the network stack must
be able to handle multiple network interfaces, and we
argue that, if designed carefully, the overhead of
multi-interface support is negligible compared to
single-interface support, even on constrained devices.

Parallel data handling. Most embedded network
stacks achieve their small memory footprint by reducing
their functionality, to the point where they are only able
to handle a single network packet at a time. While this
might be reasonable in some use cases, it is unrealistic
in general. In particular, when using IPv6 over spontaneous
wireless networks, multiple services run in parallel,
e.g., both routing and neighbor discovery protocols are
tightly coupled to data transfers between nodes. Thus,
the network stack must be able to handle multiple packets
and data streams in parallel.

2.3 Non-functional Requirements 

Open Standards and tools. Decades of experience
with the Internet indicate that deployment success depends
on (open) standards. To achieve future-proof
interoperability despite heterogeneity amongst IoT devices,
the network stack must be standards-compliant.
Heterogeneity is not only found in IoT hardware but also in
development environments and processes. We argue that a
standard network stack should only depend on open tools
and standard paradigms (e.g., ANSI C) to allow easy
integration. Exotic tools and programming languages become
a fatal hurdle on the way to reaching the critical
mass of developers necessary to develop and maintain
in the long run a piece of software as sophisticated as a
network stack (a typical example of this phenomenon is
TinyOS with the nesC language [?]).

Configurability. The objective is the design of a
versatile network stack that can be adapted to a variety
of IoT scenarios. However, the granularity of configuration
should avoid too many configuration options that
have unclear meaning and effects (and thus are only usable
by experts). Key configuration parameters must be
well documented and accessible from a central point to
achieve a user-friendly and flexible solution.

Extensibility via clean interfaces. Clean interfaces
yield two important advantages. First, they focus modules
on their core functionality, thus preventing entangled
code. Second, they yield testability by design.
Furthermore, modules and clean interfaces enable substitution
of parts of the network stack, which can easily be
tailored according to the IoT scenario. For example, it is
straightforward to switch between two different
implementations of a neighbor cache, one being optimized for
run-time performance using a heap data structure, and
another being optimized for memory efficiency using a
simple circular list. However, again, the granularity of
modules should remain coarse enough to avoid the pitfalls
of ultra-fragmented code, which quickly becomes
unmanageable, as analyzed in [?].
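
The neighbor cache example can be made concrete with a small
sketch. It is not taken from the proposed stack: the type
nc_ops_t, the 16-bit keys, and both implementations are
hypothetical, and a direct-mapped table stands in for the
heap-based variant mentioned above to keep the code short. It
only illustrates how two implementations can sit behind one
interface so that callers do not change when the cache is
swapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface that every neighbor cache variant fills in. */
typedef struct {
    void (*add)(uint16_t key, uint16_t l2_addr);
    int  (*lookup)(uint16_t key, uint16_t *l2_addr); /* 0 on hit, -1 on miss */
} nc_ops_t;

/* Variant 1: small circular list, the oldest entry is overwritten first. */
#define NC_RING_SIZE 4
static struct { uint16_t key, addr; int used; } ring[NC_RING_SIZE];
static unsigned ring_next;

static void ring_add(uint16_t key, uint16_t addr)
{
    ring[ring_next].key = key;
    ring[ring_next].addr = addr;
    ring[ring_next].used = 1;
    ring_next = (ring_next + 1) % NC_RING_SIZE;
}

static int ring_lookup(uint16_t key, uint16_t *addr)
{
    assert(addr != NULL);
    for (unsigned i = 0; i < NC_RING_SIZE; i++) {   /* linear scan */
        if (ring[i].used && ring[i].key == key) {
            *addr = ring[i].addr;
            return 0;
        }
    }
    return -1;
}

/* Variant 2: direct-mapped table, constant-time lookup at the cost of RAM.
 * Colliding keys simply replace each other. */
#define NC_TABLE_SIZE 64
static struct { uint16_t key, addr; int used; } table[NC_TABLE_SIZE];

static void table_add(uint16_t key, uint16_t addr)
{
    unsigned i = key % NC_TABLE_SIZE;
    table[i].key = key;
    table[i].addr = addr;
    table[i].used = 1;
}

static int table_lookup(uint16_t key, uint16_t *addr)
{
    unsigned i = key % NC_TABLE_SIZE;
    assert(addr != NULL);
    if (table[i].used && table[i].key == key) {
        *addr = table[i].addr;
        return 0;
    }
    return -1;
}

const nc_ops_t nc_ring  = { ring_add, ring_lookup };
const nc_ops_t nc_table = { table_add, table_lookup };

/* Caller code depends only on the interface, not on the variant. */
int nc_resolve(const nc_ops_t *nc, uint16_t key, uint16_t *addr)
{
    return nc->lookup(key, addr);
}
```

A build for a memory-starved node would pass &nc_ring to
nc_resolve(), a border router could pass &nc_table instead; the
calling code stays the same in both cases.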

Low memory footprint. While we do not aim for
the smallest possible memory footprint (we have other
goals, as stated above), we do target devices with very
limited resources. As a concrete upper bound, we aim for a
maximum of 30 kByte of ROM and 10 kByte of RAM for a
single-interface configuration running 6LoWPAN, RPL, and
UDP. These target numbers align well with the available
resources on class 1 devices [?], which we expect to be one
of the most significant classes of IoT devices in the near
future.

Low-power design. Many IoT devices are expected
to run for years on small batteries. Experience shows
that optimizations for low power are hard to add later,
and thus should be built in by design, from the very
beginning. This has mainly two consequences: (i) the design
of the network stack must make it easy to vary the protocols
used in different scenarios, as best suited, and (ii) the
implementation must use efficient data structures and
algorithms allowing maximum sleep intervals for the CPU.

3 Related Work: Existing Network Stacks 

Today’s Internet is unthinkable without Linux/Unix and
their network stacks, successors of the 4.4BSD network
stack [8, 25]. Although they were originally developed
in times when the memory constraints of a typical
computer were roughly comparable with those of current
IoT devices, their development followed fundamentally
different design objectives, focusing predominantly on
throughput (this manifests itself, e.g., in the way buffers
are designed). Over the years this led to a drastically
increased memory footprint and made it inconceivable to
run or port these stacks to IoT devices.

Over the last decade, and even more since 6LoWPAN
has evolved, a number of network stacks have been developed
specifically for embedded devices. One category
of stacks are ultra-minimalistic implementations, such as
the work by Santos et al. [9], which, by design, are not
extensible and cannot become a full-featured IP stack.
Thus, they do not meet the requirements from Section 2.
Various other stacks, as presented by several surveys, can
roughly be put in three groups: (i) discontinued, (ii)
proprietary and closed-source, or (iii) open-source and
freely available [?]. In the following we will
focus on the third group (the analyzed requirements
disqualify the others).

Sensinode’s open NanoStack 1.1 [?] was superseded
by the proprietary implementation of NanoStack 2.0,
and thus does not satisfy the requirements we derived in
Section 2.

A number of relevant network stacks were based on
TinyOS [?]. However, since they use TinyOS’
exotic programming language nesC, they do not match
the requirements from Section 2. Additionally, we argue
that due to the high complexity of TinyOS’ system design
and the resulting limited number of available developers (as
analyzed by P. Levis [?]), it is very unlikely that
development of these stacks will keep up with the evolution of
new IoT protocols.

An interesting approach towards a fully configurable
network stack for embedded environments was proposed
with CiAO/IP [?]. However, it does not match the derived
requirements for similar reasons as the stacks for
TinyOS, since it is based on an exotic C++ dialect and
an exotic compiler. Moreover, the intended granularity
of configurability is too fine-grained to be manageable
by most application developers.

The two most prominent embedded network stacks today
are uIP [?] and lwIP [?]. Both were developed at
the same time by the same author as pure IPv4 stacks.
Over time, uIP has evolved from being developed as a
stand-alone network stack to being maintained as the default
network stack of the Contiki operating system, supporting
a full 6LoWPAN protocol stack [13, 14]. However, the
stack does not support multiple network interfaces
and is based on an event loop paradigm.
This makes it hard to program for a typical programmer
experienced in traditional networking applications, and
makes it more difficult to implement several protocols and
mechanisms from the TCP/IP suite [?]. The lwIP stack has
similarly evolved over time, with IPv6 support added
recently. For use in the IoT, lwIP is missing support
for 6LoWPAN. Although both stacks can be configured
to a good extent, they are missing clear documentation
and interfaces for easy extensibility. For these reasons,
we see both stacks failing to comply with the derived
requirements.

4 General Architecture 

The key design rule for the proposed network stack software
architecture is a strict module-driven design. We
especially emphasize a clean definition of the interfaces
between these software modules, as this ensures
interchangeability of modules (i.e., the ability to choose from
different implementations for different scenarios) and
interoperability of these modules. In this section we give a
brief overview of the most relevant design decisions.

4.1 Modular Design 

The top level of the software architecture consists of a
number of high-level modules, one for each functional
entity of the network stack, for example UDP, IPv6,
6LoWPAN, or RPL. The novelty of the proposed architecture
is that each high-level module is executed in its
own thread, while all modules serve the same API using
the operating system’s IPC. The unified interface
allows for chaining multiple modules together. The
concept is comparable to Unix STREAMS, as proposed
in the 80s [23], with the difference that we transferred
the STREAMS concept to work via IPC.

Figure 1 illustrates a network stack configuration
with three network devices. The netapi depicts the unified
IPC API between the high-level modules. Although each
of these modules can roughly be mapped to layers of
the TCP/IP model, the architecture does not enforce this
mapping.

This design allows for a very flexible configuration of
modules (even at run-time if needed) and, just as important,
it enables straightforward extension with new features
or additional layers. During design and implementation
of modules, this design further enables a clear separation
of concerns and efficient testing
of the modules. Using a unified IPC API yields further
benefits when adding integrated network devices to the
system that already include parts of the protocol stack,
like Texas Instruments’ CC3000, which already provides
a full TCP/IP stack [1]. For a network interface
that, e.g., already includes a full IP implementation, one
simply needs to write a host-side device driver that can
serve netapi and make it known to one or more transport
layer modules.

Figure 1: Sample configuration of a network stack. Each
box depicts a high-level module running in its own
thread.

One might argue that IPC comes with a high price
w.r.t. run-time performance and therefore energy usage.
However, our measurements using RIOT on state-of-the-art
IoT hardware (a 32-bit ARM Cortex-M3 platform)
show that sending a message from one thread to another,
including context save, running the scheduler, and context
restore, requires a number of CPU cycles that is only one
order of magnitude higher than the number of cycles
needed for a direct function call. The benefits thus
outweigh this overhead because (i) packet throughput on
IoT devices is typically low, and (ii) there are few layers
going up the stack, typically yielding IPC on fewer than
four occasions.


4.2 Inter-module Communication: netapi 

We introduce a unified interface for communication
between high-level modules called netapi. This interface
is built around a small set of messages sent
between the modules using the operating system’s
IPC. The idea behind this interface is that every
layer in the network stack serves an identical interface.
The core of the netapi interface is a minimal
set of messages, of which the most essential are
WRITE_DATA, REGISTER_RECEIVE_CALLBACK, and
SET_OPTION and GET_OPTION. As each module must be able
to parse the general format of netapi messages, it can
implement any subset of possible message types and
reply with an ENOTSUP ("Operation not supported",
POSIX.1-2001) for all other message types [?].
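
A minimal sketch of this dispatch rule is given below. The
message type names follow the text, but the enum values, the
netapi_msg_t layout, and the function udp_handle() are
assumptions made for illustration, and a direct function call
stands in for the operating system's IPC. The module shown
implements only WRITE_DATA and answers every other message type
with ENOTSUP.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Message types named in the text; the numeric values are assumptions. */
typedef enum {
    NETAPI_WRITE_DATA,
    NETAPI_REGISTER_RECEIVE_CALLBACK,
    NETAPI_SET_OPTION,
    NETAPI_GET_OPTION
} netapi_type_t;

/* Hypothetical common message layout that every module can parse. */
typedef struct {
    netapi_type_t type;
    const void *data;
    size_t len;
} netapi_msg_t;

static size_t udp_bytes_sent;   /* stand-in for real protocol state */

/* A UDP-like module that supports only a subset of the message types.
 * Returns the number of bytes accepted, or a negative errno value. */
int udp_handle(const netapi_msg_t *msg)
{
    assert(msg != NULL);
    switch (msg->type) {
    case NETAPI_WRITE_DATA:
        udp_bytes_sent += msg->len;   /* real code would build a datagram */
        return (int)msg->len;
    default:
        return -ENOTSUP;              /* unsupported message type */
    }
}
```

Because the reply for unsupported messages is uniform, a caller
can probe a module with any message type and handle -ENOTSUP
without knowing which layer it is talking to.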

4.3 Driver Interaction: netdev

The proposed architecture introduces a second unified
interface for communication between device drivers and
medium access control (MAC) protocols, called netdev.
In contrast to the netapi interface, this API is based on
direct function calls instead of IPC. The practical reason
for introducing a second interface at this stage is the
tight timing constraints of MAC protocols (e.g., schemes
based on TDMA). Using the netdev API allows (i) for
independent implementations of device drivers and MAC
protocols and (ii) for better re-use and exchangeability
of both, which in turn increases portability.
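
The following sketch shows one way such a function-call
interface could look. None of the names (netdev_t,
netdev_ops_t, mac_send, the dummy driver) come from the
proposed stack; they are hypothetical. The point is only that
the MAC code reaches the driver through a table of function
pointers, with no IPC in between, so either side can be
replaced independently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct netdev netdev_t;

/* Hypothetical driver operations table; one instance per driver type. */
typedef struct {
    int (*send)(netdev_t *dev, const void *frame, size_t len);
} netdev_ops_t;

struct netdev {
    const netdev_ops_t *ops;
    unsigned char txbuf[127];   /* 127 bytes: max IEEE 802.15.4 PHY payload */
    size_t txlen;
};

/* Dummy driver: copies the frame into its device buffer. */
static int dummy_send(netdev_t *dev, const void *frame, size_t len)
{
    if (len > sizeof(dev->txbuf)) {
        return -1;              /* frame does not fit */
    }
    memcpy(dev->txbuf, frame, len);
    dev->txlen = len;
    return (int)len;
}

const netdev_ops_t dummy_ops = { dummy_send };

/* MAC side: a plain function call into the driver, no IPC involved. */
int mac_send(netdev_t *dev, const void *frame, size_t len)
{
    assert(dev != NULL && dev->ops != NULL);
    return dev->ops->send(dev, frame, len);
}
```

A different radio driver would provide its own netdev_ops_t
instance, and a different MAC protocol would call the same
send() entry, which is the exchangeability described above.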


4.4 Packet Buffering 

A key issue to solve in the design of a network stack for
constrained devices is the handling of buffers for user
data and protocol headers, as these are stored in RAM,
one of the most limited resources. Typical design
choices for these buffers include centralized approaches,
copying data from module to module, as well as mixed
concepts. The data handling in the proposed network
stack is designed around a "copy twice" concept. Outgoing
data is copied once from the user application (socket)
into a central buffer and once into a network interface’s
device buffer by the device driver. The same is true for
received data, which is copied on arrival once from the
network interface into the central buffer and once more
when handed over to an application.

The central packet buffer is designed as a central module
accessible from all high-level modules through a
well-defined API. The buffer’s task is to centrally provision
memory for storing header and user data while it is passing
through the network stack, either as packets in one
piece or as fragments. By accessing the packet buffer
through a defined interface, it is further possible to
transparently exchange the packet buffer’s implementation at
compile time, e.g., to swap one that manages a region of
statically allocated memory for an implementation using
dynamic memory on the heap.

The major advantages of a central buffer are (i)
flexibility, (ii) efficiency through less data copying, and (iii)
the possibility to globally define the (maximum) amount
of memory used. A drawback of a buffer taking chunks
of data in different sizes is fragmentation, but we argue
that with an efficient implementation this disadvantage is
marginal. By including means of prioritization for memory
allocations in the packet buffer’s API, we can further
make sure that no network module is starved of
buffer space, thus removing a major source of
deadlocks.
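
To illustrate the globally bounded memory and the
prioritization idea, the sketch below implements a tiny
statically allocated pool. It is a simplification under stated
assumptions: it uses fixed-size chunks rather than the
variable-size chunks discussed above, and the names
(pktbuf_alloc, pktbuf_release), the pool dimensions, and the
two priority levels are hypothetical. A few chunks are held
back so that high-priority callers can still allocate when
normal traffic has filled the pool.

```c
#include <assert.h>
#include <stddef.h>

#define PKTBUF_CHUNKS    8    /* assumed number of chunks in the pool */
#define PKTBUF_CHUNK_SZ  128  /* assumed chunk size in bytes */
#define PKTBUF_RESERVED  2    /* chunks held back for high-priority users */

typedef enum { PKTBUF_PRIO_NORMAL, PKTBUF_PRIO_HIGH } pktbuf_prio_t;

/* Total RAM use is fixed at compile time by these static arrays. */
static unsigned char pool[PKTBUF_CHUNKS][PKTBUF_CHUNK_SZ];
static unsigned char in_use[PKTBUF_CHUNKS];
static unsigned used_count;

/* Returns a chunk of at least len bytes, or NULL if none is available
 * for the given priority. */
void *pktbuf_alloc(size_t len, pktbuf_prio_t prio)
{
    unsigned limit = (prio == PKTBUF_PRIO_HIGH)
                   ? PKTBUF_CHUNKS
                   : PKTBUF_CHUNKS - PKTBUF_RESERVED;

    if (len > PKTBUF_CHUNK_SZ || used_count >= limit) {
        return NULL;
    }
    for (unsigned i = 0; i < PKTBUF_CHUNKS; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            used_count++;
            return pool[i];
        }
    }
    return NULL;
}

/* Returns a chunk to the pool; unknown pointers are ignored. */
void pktbuf_release(void *ptr)
{
    assert(ptr != NULL);
    for (unsigned i = 0; i < PKTBUF_CHUNKS; i++) {
        if (ptr == (void *)pool[i] && in_use[i]) {
            in_use[i] = 0;
            used_count--;
            return;
        }
    }
}
```

With these numbers, normal-priority callers can hold at most
six chunks, so a routing or neighbor discovery module using the
high priority always finds room as long as it holds no more
than the two reserved chunks itself.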


5 Preliminary Evaluation 

We implemented a proof of concept of our approach for
the operating system RIOT [3]. To illustrate its feasibility
in principle, we present the required amount of memory.
Note that the values are still subject to optimization.

Our evaluation is based on a simple configuration using
UDP, 6LoWPAN, and a single IEEE 802.15.4 network
interface built for the IoT-LAB_M3 hardware [?].
Table 1 shows the ROM usage for relevant modules of
the network stack. Our modular network stack, which is
based on common programming techniques and system
calls, requires less than 30 kByte of ROM and is thus in
line with IoT resources.


Module              Bytes
IEEE 802.15.4       1,112
6LoWPAN and IPv6   15,708
UDP                   886
Socket API          1,280
Helper Functions    2,530

Table 1: Preliminary code size of main network stack
modules on an IoT-LAB_M3 node (ARM Cortex-M3)

The RAM usage is mainly driven by two factors: (i)
buffers and (ii) stacks. While the size of the central
packet buffer is configurable at compile time,
we estimate that networks like IEEE 802.15.4
require a buffer of no more than 1-2 kByte of RAM. The
memory consumed by stacks depends on the number of
high-level network modules that are configured. In our setup,
we use one thread per network function (i.e., UDP, IP,
6LoWPAN, and the link layer). With a default stack size
of 1 kByte for ARM Cortex-M3 platforms, this amounts
to an additional memory usage of 4 kByte. Overall, the
required RAM size complies with the target platforms
(i.e., < 10 kByte).
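
The memory figures can be checked with a few lines of
arithmetic. The sketch below sums the ROM sizes from Table 1
and adds up the RAM estimate from the text (four thread stacks
of 1 kByte each plus a packet buffer of up to 2 kByte). The
function names are hypothetical, 1 kByte is assumed to be
1024 bytes, and the RAM estimate deliberately covers only
stacks and the packet buffer, as the text does.

```c
#include <assert.h>
#include <stddef.h>

/* ROM sizes in bytes, copied from Table 1. */
static const unsigned rom_bytes[] = { 1112, 15708, 886, 1280, 2530 };

/* Sum of the per-module ROM sizes. */
unsigned rom_total(void)
{
    unsigned sum = 0;
    for (size_t i = 0; i < sizeof(rom_bytes) / sizeof(rom_bytes[0]); i++) {
        sum += rom_bytes[i];
    }
    return sum;
}

/* RAM estimate: one stack per module thread plus the central buffer. */
unsigned ram_estimate(unsigned threads, unsigned stack_bytes,
                      unsigned pktbuf_bytes)
{
    assert(threads > 0);
    return threads * stack_bytes + pktbuf_bytes;
}
```

With these inputs, rom_total() yields 21,516 bytes and
ram_estimate(4, 1024, 2048) yields 6,144 bytes, both below the
30 kByte ROM and 10 kByte RAM targets from Section 2.3.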

6 Conclusion & Outlook 

In this paper, we questioned the applicability of current
network stack solutions for the Internet of Things (IoT).
Following the observation that several IoT scenarios
introduce constrained devices but do not require the most
memory-efficient network stack possible, we explored the
design space and introduced a software architecture for a
modular, full-featured network stack. Our proof of concept
is based on a common system environment and requires
< 10 kBytes of RAM and < 22 kBytes of ROM.
Our next steps will be to complete our implementation
for the open source operating system RIOT and explore
the limits of our concept in different IoT scenarios.